Simple Sampling Techniques for Discovery Science

Author

  • Osamu Watanabe
Abstract

We explain three random sampling techniques that are simple yet widely applicable to problems involving huge data sets. The first technique is an immediate application of large deviation bounds. The second and third are sequential sampling, or adaptive sampling, techniques. We fix one simple problem and explain these techniques by demonstrating algorithms for this problem and discussing their correctness and efficiency.

Key words: random sampling, the Chernoff bound, the Hoeffding bound, the Central Limit Theorem, sequential sampling, adaptive sampling
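The first technique, a direct application of a large deviation bound, can be illustrated with a minimal sketch. The abstract does not say which problem the paper fixes, so the sketch assumes a common one: estimating the fraction p of records that satisfy some Boolean condition. It uses the two-sided Hoeffding bound, which guarantees that n >= ln(2/delta) / (2 * eps^2) independent samples give an estimate within eps of p with probability at least 1 - delta. The names `hoeffding_sample_size`, `estimate_fraction`, `eps`, and `delta` are hypothetical and chosen for illustration only.

```python
import math
import random


def hoeffding_sample_size(eps, delta):
    """Samples needed so that |estimate - p| <= eps with probability >= 1 - delta.

    Follows from the two-sided Hoeffding bound for 0/1-valued samples:
    Pr[|estimate - p| > eps] <= 2 * exp(-2 * n * eps**2).
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def estimate_fraction(data, predicate, eps, delta, rng=None):
    """Estimate the fraction of items in `data` that satisfy `predicate`.

    Assumed setting (not taken from the paper): `data` is an indexable
    collection, and we sample uniformly at random with replacement.
    The sample size is fixed in advance and does not depend on len(data).
    """
    rng = rng or random.Random()
    n = hoeffding_sample_size(eps, delta)
    hits = sum(1 for _ in range(n) if predicate(rng.choice(data)))
    return hits / n


if __name__ == "__main__":
    records = [1] * 300 + [0] * 700  # true fraction is 0.3
    print(hoeffding_sample_size(0.05, 0.05))
    print(estimate_fraction(records, lambda x: x == 1, 0.05, 0.05))
```

The sample size depends only on `eps` and `delta`, not on the size of the data set, which is what makes this kind of batch sampling attractive for huge data. Its drawback is that the size is chosen for the worst case; sequential or adaptive sampling, the subject of the second and third techniques, instead decides when to stop based on the samples seen so far.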

Similar resources

How Can Computer Science Contribute to Knowledge Discovery?

Knowledge discovery, that is, analyzing a given massive data set and deriving or discovering some knowledge from it, has become quite an important subject in several fields, including computer science. Good software is in demand for various knowledge discovery tasks. For such software, we often need to develop efficient algorithms for handling huge data sets. Random sampling is one of ...

Practical Algorithms for On-line Sampling

One of the core applications of machine learning to knowledge discovery consists of building a function (a hypothesis), for instance a decision tree or a neural network, from a given amount of data, so that we can use it afterwards to predict new instances of the data. In this paper, we focus on a particular situation where we assume that the hypothesis we want to use for prediction is very si...

Importance sampling the Rayleigh phase function.

Rayleigh scattering is used frequently in Monte Carlo simulation of multiple scattering. The Rayleigh phase function is quite simple, and one might expect that it should be simple to importance sample it efficiently. However, there seems to be no one good way of sampling it in the literature. This paper provides the details of several different techniques for importance sampling the Rayleigh ph...

A Survey of Data Mining Applications in the Health System

Introduction: The extensive amounts of data stored in medical databases require the development of specialized tools for accessing the data, data analysis, knowledge discovery, and the effective use of the data. Data mining is one of the most important methods. The article sketches the data mining techniques in use and illustrates their applicability to medical diagnostic and prognostic problems. ...

Publication date: 1999